hindsight goal
GCHR: Goal-Conditioned Hindsight Regularization for Sample-Efficient Reinforcement Learning
Lei, Xing, Yang, Wenyan, Ke, Kaiqiang, Yang, Shentao, Zhang, Xuetao, Pajarinen, Joni, Wang, Donglin
Goal-conditioned reinforcement learning (GCRL) with sparse rewards remains a fundamental challenge in reinforcement learning. While hindsight experience replay (HER) has shown promise by relabeling collected trajectories with achieved goals, we argue that trajectory relabeling alone does not fully exploit the available experiences in off-policy GCRL methods, resulting in limited sample efficiency. In this paper, we propose Hindsight Goal-conditioned Regularization (HGR), a technique that generates action regularization priors based on hindsight goals. When combined with hindsight self-imitation regularization (HSR), our approach enables off-policy RL algorithms to maximize experience utilization. Compared to existing GCRL methods that employ HER and self-imitation techniques, our hindsight regularizations achieve substantially more efficient sample reuse and the best performance, which we demonstrate empirically on a suite of navigation and manipulation tasks.
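The hindsight relabeling step this abstract builds on can be sketched as follows. This is a minimal illustration of HER's "future" relabeling strategy only, not the paper's HGR/HSR regularizers; the trajectory layout and `reward_fn` interface are assumptions for illustration.

```python
import numpy as np

def her_relabel(trajectory, reward_fn, k=4, rng=None):
    """Sketch of HER "future" relabeling: for each transition, sample up to
    k goals from states achieved later in the same trajectory and recompute
    the sparse reward against those hindsight goals.

    `trajectory` is assumed to be a list of (state, action, achieved_goal)
    tuples; `reward_fn(achieved, goal)` is an assumed sparse-reward callable.
    """
    rng = rng or np.random.default_rng(0)
    relabeled = []
    T = len(trajectory)
    for t, (s, a, ag) in enumerate(trajectory[:-1]):
        # sample indices strictly after t ("future" strategy)
        future = rng.integers(t + 1, T, size=min(k, T - t - 1))
        for idx in future:
            goal = trajectory[idx][2]   # goal achieved later in the episode
            r = reward_fn(ag, goal)     # recompute sparse reward for new goal
            relabeled.append((s, a, goal, r))
    return relabeled
```

The relabeled tuples would then be pushed into the off-policy replay buffer alongside the originals, which is the experience-reuse baseline the paper's regularizers extend.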
Reviews: Curriculum-guided Hindsight Experience Replay
The paper borrows tools from combinatorial optimization (specifically, the facility location problem) to select hindsight goals that simultaneously have high diversity and proximity to the desired goals. As mentioned, the similarity metric used for the proximity term seems to require the domain knowledge that Euclidean distance works well for this task. This may be problematic if obstacles make Euclidean distance misleading, or in another environment where it is less obvious what the similarity metric should be. I am aware that this dense similarity metric is only used for selecting hindsight goals, and that the underlying Q-function/policy is still trained on the sparse reward (without the bias). There are several related works that could be discussed and potentially benchmarked against in terms of hindsight goal sampling schemes: sampling from the ground-truth goal distribution half the time for relabeling, and using the future strategy the other half (in the Appendix).
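The diversity-plus-proximity selection the review describes can be sketched as a greedy facility-location heuristic. The function names, the Euclidean metric the review criticizes, and the trade-off weight `lam` are all assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def select_hindsight_goals(candidates, desired_goals, m, lam=1.0):
    """Greedy sketch: pick m candidate goals maximizing a facility-location
    diversity (coverage) term while penalizing Euclidean distance to the
    nearest desired goal (the proximity term the review discusses)."""
    candidates = np.asarray(candidates, dtype=float)
    desired = np.asarray(desired_goals, dtype=float)
    # proximity cost: distance from each candidate to its nearest desired goal
    prox = np.min(np.linalg.norm(
        candidates[:, None, :] - desired[None, :, :], axis=-1), axis=1)
    selected = []
    for _ in range(m):
        best, best_score = None, -np.inf
        for i in range(len(candidates)):
            if i in selected:
                continue
            trial = selected + [i]
            # facility-location coverage: each candidate is "served" by its
            # closest selected goal; smaller total distance = better coverage
            d = np.linalg.norm(
                candidates[:, None, :] - candidates[trial][None, :, :],
                axis=-1)
            coverage = -np.min(d, axis=1).sum()
            score = coverage - lam * prox[trial].sum()
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [candidates[i] for i in selected]
```

The greedy loop is the standard submodular-maximization heuristic for facility location; the review's concern is precisely that the `prox` term hard-codes Euclidean distance.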
Proving Theorems using Incremental Learning and Hindsight Experience Replay
Aygün, Eser, Orseau, Laurent, Anand, Ankit, Glorot, Xavier, Firoiu, Vlad, Zhang, Lei M., Precup, Doina, Mourad, Shibl
Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains. Machine learning approaches in the literature either depend on these traditional provers to bootstrap themselves or fall short of reaching comparable performance. In this paper, we propose a general incremental learning algorithm for training domain-specific provers for first-order logic without equality, based only on a basic given-clause algorithm, but using a learned clause-scoring function. Clauses are represented as graphs and presented to transformer networks with spectral features. To address the sparsity and the initial lack of training data, as well as the lack of a natural curriculum, we adapt hindsight experience replay to theorem proving, so as to be able to learn even when no proof can be found. We show that provers trained this way can match and sometimes surpass state-of-the-art traditional provers on the TPTP dataset in terms of both quantity and quality of the proofs.
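The given-clause algorithm the abstract takes as its base can be sketched as a priority-queue loop driven by the clause-scoring function. The callables `score`, `infer`, and `is_empty_clause` are assumed interfaces for illustration; the paper's provers use learned transformer scores rather than a hand-written one.

```python
import heapq

def given_clause_loop(axioms, score, infer, is_empty_clause, max_steps=1000):
    """Minimal sketch of a basic given-clause loop: repeatedly pop the
    lowest-scoring unprocessed clause, check for the empty clause (a proof),
    then add all inferences between it and the processed set."""
    unprocessed = [(score(c), i, c) for i, c in enumerate(axioms)]
    heapq.heapify(unprocessed)
    processed, counter = [], len(axioms)
    for _ in range(max_steps):
        if not unprocessed:
            return None                    # saturated without finding a proof
        _, _, given = heapq.heappop(unprocessed)
        if is_empty_clause(given):
            return given                   # empty clause derived: proof found
        processed.append(given)
        for new in infer(given, processed):
            heapq.heappush(unprocessed, (score(new), counter, new))
            counter += 1
    return None                            # step budget exhausted
```

Hindsight replay enters around this loop: even failed runs yield derived clauses that can be treated, in hindsight, as "goals" that were proved, giving training signal when no proof of the original conjecture is found.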
Exploration via Hindsight Goal Generation
Ren, Zhizhou, Dong, Kefan, Zhou, Yuan, Liu, Qiang, Peng, Jian
Goal-oriented reinforcement learning has recently become a practical framework for robotic manipulation tasks, in which an agent is required to reach a certain goal defined by a function on the state space. However, the sparsity of such a reward definition makes traditional reinforcement learning algorithms very inefficient. Hindsight Experience Replay (HER), a recent advance, has greatly improved sample efficiency and practical applicability for such problems. It exploits previous replays by constructing imaginary goals in a simple heuristic way, acting like an implicit curriculum to alleviate the challenge of the sparse reward signal. In this paper, we introduce Hindsight Goal Generation (HGG), a novel algorithmic framework that generates valuable hindsight goals which are easy for an agent to achieve in the short term and also have the potential to guide the agent to reach the actual goal in the long term. We have extensively evaluated our goal generation algorithm on a number of robotic manipulation tasks and demonstrated substantial improvement over the original HER in terms of sample efficiency.
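The short-term/long-term trade-off the abstract describes can be sketched as a simple scoring rule over previously achieved goals. The `values` reachability estimate and the weight `c` are assumptions for illustration; the actual HGG method optimizes this trade-off with a Wasserstein-style matching over goal distributions rather than a pointwise argmax.

```python
import numpy as np

def hgg_goal_candidate(achieved_goals, desired_goal, values, c=1.0):
    """Toy sketch of the HGG trade-off: score previously achieved goals by
    how reachable they are (a value estimate, short-term ease) minus how
    far they remain from the actual desired goal (long-term usefulness),
    then pursue the top-scoring one as the next hindsight goal."""
    achieved = np.asarray(achieved_goals, dtype=float)
    desired = np.asarray(desired_goal, dtype=float)
    dist = np.linalg.norm(achieved - desired, axis=1)   # remaining gap
    score = np.asarray(values, dtype=float) - c * dist  # ease minus gap
    return achieved[int(np.argmax(score))]
```

As training progresses and value estimates improve near the frontier, the selected goals drift toward the true desired goal, which is the implicit-curriculum behavior the abstract contrasts with HER's heuristic relabeling.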